Annotation models for crowdsourced ordinal data

Authors

  • Vikas C. Raykar
  • Shipeng Yu
Abstract

In supervised learning, when acquiring good-quality labels is hard, practitioners resort to having the data labeled by multiple noisy annotators. Various methods have been proposed to estimate consensus labels when the annotations are binary or categorical. When the labels are inherently subjective, a commonly used paradigm is to annotate instances on an ordinal scale. In this paper we propose annotator models based on Receiver Operating Characteristic (ROC) curve analysis to consolidate ordinal annotations from multiple annotators. The models lead to simple Expectation-Maximization (EM) algorithms that jointly estimate the consensus labels and the annotators' performance. Experiments indicate that the proposed algorithm is superior to the commonly used majority voting rule.
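
The abstract describes EM algorithms that jointly estimate consensus ordinal labels and annotator performance, with majority voting as the baseline. The paper's ROC-based annotator models are not spelled out on this page, so the sketch below stands in a generic confusion-matrix (Dawid-Skene-style) EM purely to illustrate the joint-estimation idea alongside a majority-vote baseline. The function names, the NumPy dependency, and the toy +/-1 ordinal noise in the usage snippet are illustrative assumptions, not the authors' model.

```python
# Minimal sketch, assuming a Dawid-Skene-style confusion-matrix model rather
# than the paper's ROC-based annotator models (hypothetical stand-in).
import numpy as np


def majority_vote(labels, K):
    """labels: (n_items, n_annotators) ints in {0..K-1}; returns the per-item mode."""
    counts = np.stack([(labels == k).sum(axis=1) for k in range(K)], axis=1)
    return counts.argmax(axis=1)


def em_consensus(labels, K, n_iter=50, eps=1e-6):
    """Jointly estimate soft consensus labels and one KxK confusion matrix per
    annotator with EM (illustrative stand-in for the paper's annotator models)."""
    n_items, n_annot = labels.shape
    # q[i, k] = current estimate of P(true label of item i is k); init from vote shares.
    q = np.stack([(labels == k).sum(axis=1) for k in range(K)], axis=1).astype(float)
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class prior and theta[j, t, o] = P(annotator j reports o | true label t).
        prior = q.mean(axis=0) + eps
        prior /= prior.sum()
        theta = np.full((n_annot, K, K), eps)
        for j in range(n_annot):
            for o in range(K):
                theta[j, :, o] += q[labels[:, j] == o].sum(axis=0)
        theta /= theta.sum(axis=2, keepdims=True)
        # E-step: posterior over true labels given every annotator's report.
        log_q = np.tile(np.log(prior), (n_items, 1))
        for j in range(n_annot):
            log_q += np.log(theta[j][:, labels[:, j]]).T
        q = np.exp(log_q - log_q.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
    return q.argmax(axis=1), theta


if __name__ == "__main__":
    # Toy data: 4-point ordinal scale, 5 annotators corrupted by +/-1 ordinal noise.
    rng = np.random.default_rng(0)
    true = rng.integers(0, 4, size=300)
    obs = np.clip(true[:, None] + rng.integers(-1, 2, size=(300, 5)), 0, 3)
    mv = majority_vote(obs, K=4)
    em_labels, _ = em_consensus(obs, K=4)
    print("majority vote accuracy:", (mv == true).mean())
    print("EM consensus accuracy: ", (em_labels == true).mean())
```

On this toy data the EM consensus typically recovers more items than the per-item mode, because it down-weights annotators whose estimated confusion matrices are far from diagonal; the gains reported in the paper come from its ROC-based ordinal models, which this sketch does not reproduce.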

Similar articles

Inferring ground truth from multi-annotator ordinal data: a probabilistic approach

A popular approach for large scale data annotation tasks is crowdsourcing, wherein each data point is labeled by multiple noisy annotators. We consider the problem of inferring ground truth from noisy ordinal labels obtained from multiple annotators of varying and unknown expertise levels. Annotation models for ordinal data have been proposed mostly as extensions of their binary/categorical cou...

Semantic Annotation Aggregation with Conditional Crowdsourcing Models and Word Embeddings

In modern text annotation projects, crowdsourced annotations are often aggregated using item response models or by majority vote. Recently, item response models enhanced with generative data models have been shown to yield substantial benefits over those with conditional or no data models. However, suitable generative data models do not exist for many tasks, such as semantic labeling tasks. Whe...

Experiments with crowdsourced re-annotation of a POS tagging data set

Crowdsourcing lets us collect multiple annotations for an item from several annotators. Typically, these are annotations for non-sequential classification tasks. While there has been some work on crowdsourcing named entity annotations, researchers have largely assumed that syntactic tasks such as part-of-speech (POS) tagging cannot be crowdsourced. This paper shows that workers can actually ann...

Transition Models for Analyzing Longitudinal Data with Bivariate Mixed Ordinal and Nominal Responses

In many longitudinal studies, nominal and ordinal mixed bivariate responses are measured. In these studies, the aim is to investigate the effects of explanatory variables on these time-related responses. A regression analysis for these types of data must allow for the correlation among responses during the time. To analyze such ordinal-nominal responses, using a proposed weighting approach, an ...

Automated Evaluation of Crowdsourced Annotations in the Cultural Heritage Domain

Cultural heritage institutions are employing crowdsourcing techniques to enrich their collection. However, assessing the quality of crowdsourced annotations is a challenge for these institutions and manually evaluating all annotations is not feasible. We employ Support Vector Machines and feature set selectors to understand which annotator and annotation properties are relevant to the annotatio...


Publication date: 2011